Summary of Progress

Our work in this one-year project has been very successful. It has revolved around the following
three important areas dealing with data compression and its applications:

• the design and analysis of sophisticated methods for prediction based on data compression
techniques, with applications to prefetching, caching, and locality management,

• fast, practical, and code-efficient implementations of arithmetic coding and other coding
methods, for use in text and image compression,

• new methods for choosing motion vectors yielding substantially better rate-distortion tradeoffs
for video compression in videoconferencing applications.

Technology  Transfer 

Duke  University  recently  filed  a  patent  application  [15]  for  the  work  on  prediction. 

Optimal  Prediction  via  Data  Compression 

Caching and prefetching are important mechanisms for speeding up access time to data on secondary
storage. In their FOCS '91 paper, Prof. Vitter and graduate student P. Krishnan develop an optimal
universal prefetcher in terms of fault ratio, with particular applications to large-scale databases and
hypertext systems. The algorithms are novel in that they are based on data compression techniques
that are both theoretically optimal and good in practice. They show for powerful models such as
Markov sources and mth-order Markov sources that the page fault rates are optimal in the limit
for almost all sequences of page accesses.

An important issue that affects response time performance in current OODB and hypertext
systems is the I/O involved in moving objects from slow memory to cache. A promising way to
tackle this problem is to use prefetching, in which we predict the user's next page requests and get
those pages into cache in the background. Current databases perform limited prefetching using
techniques derived from older virtual memory systems. A novel idea of using data compression
techniques for prefetching was recently advocated by Vitter and Krishnan, in which prefetchers
based on the Lempel-Ziv data compressor (the UNIX compress command) were shown theoretically
to be optimal in the limit.
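The idea of turning a Lempel-Ziv parse into a predictor can be illustrated with a minimal sketch. It is not the prefetcher analyzed in the cited work: the class name `LZPrefetcher`, the function `fault_count`, and the rule of prefetching the `k` most frequent successors are assumptions made for this example. The sketch parses the request sequence into LZ78-style phrases, stores them in a trie with visit counts, and predicts the pages most often seen after the current trie node.

```python
class LZPrefetcher:
    """Toy LZ78-style predictor: a trie of parsed phrases with visit counts."""

    def __init__(self):
        # children[node] maps page -> [child_node_id, visit_count]; node 0 is the root.
        self.children = [dict()]
        self.current = 0

    def predict(self, k=1):
        """Return up to k pages most often requested from the current trie node."""
        ranked = sorted(self.children[self.current].items(),
                        key=lambda item: -item[1][1])
        return [page for page, _ in ranked[:k]]

    def access(self, page):
        """Record an actual page request and advance the LZ78 parse."""
        node = self.children[self.current]
        if page in node:
            # Known continuation: count it and move down the trie.
            node[page][1] += 1
            self.current = node[page][0]
        else:
            # New phrase ends here: add a leaf and restart at the root.
            self.children.append(dict())
            node[page] = [len(self.children) - 1, 1]
            self.current = 0


def fault_count(requests, k=1):
    """Count requests that were not among the k pages prefetched just before."""
    prefetcher = LZPrefetcher()
    faults = 0
    for page in requests:
        if page not in prefetcher.predict(k):
            faults += 1
        prefetcher.access(page)
    return faults
```

On a highly regular request sequence the number of faults in this sketch grows with the number of parsed phrases rather than with the number of requests, which is the intuition behind fault rates that improve in the limit.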

In our work [3], we analyze the practical aspects of using data compression techniques for
prefetching. We adapt three well-known data compressors to get three simple, deterministic, and
universal prefetchers. We simulate our prefetchers on sequences of page accesses derived from the
OO1 and OO7 benchmarks and from CAD applications, and demonstrate significant reductions in
fault rate. We examine the important issues of cache replacement, size of the data structure used
by the prefetcher, and problems arising from bursts of “fast” page requests (that leave virtually no
time between adjacent requests for prefetching and bookkeeping). We conclude that prediction for
prefetching based on data compression techniques holds great promise.

In terms of mathematical analysis of prefetching, a different approach from that used for caching
must be followed. Unlike other online problems, prefetching cannot admit a competitive analysis,
since the optimal offline prefetcher incurs no cost when it knows the future page requests.
Previous analytical work on prefetching by Vitter and Krishnan consisted of modeling the user as a
probabilistic Markov source.

In  [14]  we  look  at  the  much  stronger  form  of  worst-case  analysis  and  derive  a  randomized 
algorithm  that  we  prove  analytically  converges  almost  surely  to  the  optimal  fault  rate  in  the  worst 
case for every sequence of page requests with respect to the important class of finite state prefetchers.

In  particular,  we  make  no  assumption  about  how  the  sequence  of  page  requests  is  generated.  This 
analysis  model  can  be  looked  upon  as  a  generalization  of  the  competitive  framework,  in  that  it 
compares  an  online  algorithm  in  a  worst-case  manner  over  all  sequences  against  a  powerful  yet 
non-clairvoyant  opponent.  We  simultaneously  achieve  the  computational  goal  of  implementing 
our  prefetcher  in  optimal  constant  expected  time  per  prefetched  page,  using  the  optimal  dynamic 
discrete random variate generator of Matias, Vitter, and Ni in Proceedings of the 4th Annual
SIAM/ACM Symposium on Discrete Algorithms, Austin, TX, January 1993, 361-370.

We are currently looking at applications of these prediction techniques to other locality
management problems, such as power management for mobile computers.

Arithmetic  Coding,  Text  Compression,  and  Image  Compression 

Arithmetic coding, in conjunction with a suitable probabilistic model, can provide nearly optimal
data compression. In [5], Prof. Vitter and Paul Howard (formerly
graduate student and postdoctoral assistant) analyze the effect that the model and the particular
implementation of arithmetic coding have on the code length obtained. They show that adaptive
models give the same code length as semi-adaptive decrementing models. Periodic scaling is often
used in arithmetic coding implementations to reduce time and storage requirements; it also
introduces a recency effect that can further affect compression. They introduce the notion of “weighted
entropy” and use it to characterize in an elegant way the effect that periodic scaling has on the
code length. They also give a rigorous proof that the coding effects of rounding scaled weights,
using integer arithmetic, and encoding end-of-file are negligible.
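The equivalence between adaptive and semi-adaptive decrementing models can be checked numerically under two assumptions that belong to this sketch rather than to the report: the adaptive model starts every symbol count at 1, and the semi-adaptive coder first transmits the exact symbol counts with a uniform code over all possible count vectors. The function names are hypothetical. With those assumptions the ideal code lengths agree exactly.

```python
import math
from collections import Counter


def adaptive_code_length(seq, alphabet):
    """Ideal bits for an adaptive model that starts with count 1 per symbol."""
    counts = {s: 1 for s in alphabet}
    total = len(alphabet)
    bits = 0.0
    for s in seq:
        bits -= math.log2(counts[s] / total)
        counts[s] += 1      # learn from the symbol just coded
        total += 1
    return bits


def decrementing_code_length(seq):
    """Ideal bits for a model that starts from the exact counts and decrements."""
    counts = Counter(seq)
    total = len(seq)
    bits = 0.0
    for s in seq:
        bits -= math.log2(counts[s] / total)
        counts[s] -= 1      # this occurrence can no longer appear
        total -= 1
    return bits


def model_cost(n_symbols, length):
    """Bits to send the counts: uniform code over count vectors summing to length."""
    return math.log2(math.comb(length + n_symbols - 1, n_symbols - 1))


text = "abracadabra"
alphabet = sorted(set(text))
adaptive = adaptive_code_length(text, alphabet)
semi_adaptive = decrementing_code_length(text) + model_cost(len(alphabet), len(text))
```

Here `adaptive` and `semi_adaptive` differ only by floating-point rounding, because the adaptive probability of the whole sequence equals the decrementing probability divided by the number of possible count vectors.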

Prof. Vitter and Paul Howard in [13, 12] provide the basis of a fast, space-efficient, approximate
arithmetic coder with only minimal loss of compression efficiency. The method, called quasi-arithmetic
coding, is based on the replacement of arithmetic coding by fast, carefully arranged table lookups,
coupled with a new deterministic probability estimation scheme.

They give a detailed algorithm for fast text compression in [13] based partly on quasi-arithmetic
coding. The algorithm, related to the PPM methods, which are the state-of-the-art methods for
maximum text compression, simplifies the modeling phase by eliminating the escape mechanism,
and speeds up coding by using a combination of quasi-arithmetic coding and Rice coding. They
provide details of the use of quasi-arithmetic code tables, and analyze their compression
performance. Our Fast PPM method is shown experimentally to be almost twice as fast as the PPMC
method, while giving comparable compression.
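Rice coding, one of the two coders combined in the Fast PPM method, is simple enough to show in a few lines. The sketch below is a generic textbook Rice code, not the report's implementation; the function names and the unary convention (ones terminated by a zero) are assumptions. A non-negative integer is split by the parameter `k` into a quotient sent in unary and a remainder sent in `k` binary digits.

```python
def rice_encode(value, k):
    """Encode a non-negative integer as a bit string with Rice parameter k."""
    if value < 0 or k < 0:
        raise ValueError("value and k must be non-negative")
    quotient = value >> k
    bits = "1" * quotient + "0"          # unary quotient, terminated by a zero
    if k:
        bits += format(value & ((1 << k) - 1), f"0{k}b")  # k-bit remainder
    return bits


def rice_decode(bits, k):
    """Decode one Rice codeword from the start of a bit string."""
    quotient = bits.index("0")           # number of leading ones
    remainder = int(bits[quotient + 1:quotient + 1 + k], 2) if k else 0
    return (quotient << k) | remainder
```

For example, `rice_encode(9, 2)` yields `"11001"`: the quotient 2 in unary followed by the remainder 1 in two bits. Small values get short codewords, which suits symbols that the model ranks as likely.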

In [9] Prof. Vitter and Paul Howard give a new paradigm for lossless image compression, with
four modular components: pixel sequence, prediction, error modeling, and coding. They present
two new methods (called MLP and PPPM) for lossless compression, both involving linear
prediction, modeling prediction errors by estimating the variance of a Laplace distribution, and coding
using arithmetic coding applied to precomputed distributions. MLP is both progressive and
parallelizable. The new methods compress high-resolution images significantly better than do other
lossless methods, including the lossless mode of JPEG.
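The error-modeling step can be pictured with a small sketch, under assumptions that are not taken from the report: prediction errors are integers, the Laplace distribution has zero mean, and its scale is estimated from the errors themselves (a real coder would estimate it from already-coded neighbors so that the decoder can do the same). All function names are hypothetical. The code estimates the scale, discretizes the distribution to integer errors, and reports the ideal code length an arithmetic coder could approach.

```python
import math


def laplace_cdf(x, b):
    """CDF of a zero-mean Laplace distribution with scale b."""
    if x < 0:
        return 0.5 * math.exp(x / b)
    return 1.0 - 0.5 * math.exp(-x / b)


def error_probability(e, b):
    """Probability that the rounded prediction error equals the integer e."""
    return laplace_cdf(e + 0.5, b) - laplace_cdf(e - 0.5, b)


def estimate_scale(errors, min_scale=1e-3):
    """Maximum-likelihood Laplace scale; the variance is 2 * scale ** 2."""
    return max(sum(abs(e) for e in errors) / len(errors), min_scale)


def ideal_code_length(errors, b):
    """Bits an ideal arithmetic coder would spend on the errors under scale b."""
    # The floor guards against floating-point underflow for extreme errors.
    return -sum(math.log2(max(error_probability(e, b), 1e-300)) for e in errors)


errors = [0, 1, -1, 2, 0, -3]
scale = estimate_scale(errors)
bits = ideal_code_length(errors, scale)
```

A smaller estimated variance concentrates probability near zero, so well-predicted regions of an image cost fewer bits per pixel.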

MLP compression is improved considerably in [8] by a preliminary method for error modeling
using the variability index, which provides accurate models for pixel prediction errors without
requiring explicit transmission of the models. The variability index can also be used to show that
prediction errors do not always follow the Laplace distribution, as is commonly assumed; replacing
the Laplace distribution with a more general distribution can improve compression. A fast PRAM
implementation of MLP is proposed in [7] by the same authors.

Prof. Vitter and Paul Howard used the results of their modeling studies with the MLP method to
develop an extremely fast lossless image compressor called FELICS, which compresses as efficiently
as the JPEG lossless mode and runs five times faster [10]. Work is proceeding on a progressive variant
based on a hierarchical pixel sequence [11]. The same authors applied their modeling expertise to
address the problem of speed of the PPM family of text compressors, which are currently the most
effective methods in terms of the amount of compression achieved. The Ziv-Lempel methods (such
as the UNIX compress command) are much faster but do not compress as well. The new PPMD
method offers state-of-the-art compression at double the speed of previous PPM methods [12].

Video  Compression  via  Motion  Compensation 

In [4] we compare methods for choosing motion vectors for motion-compensated video compression.
Our primary focus is on videophone and videoconferencing applications, where very low bit rates
are necessary, where the motion is usually limited, and where the frames must be coded in the
order they are generated. We provide evidence, using established benchmark videos of this type,
that choosing motion vectors to minimize codelength subject to (implicit) constraints on quality
yields substantially better rate-distortion tradeoffs than minimizing notions of prediction error. We
illustrate this point using an algorithm within the p × 64 standard. We show that using quadtrees
to code the motion vectors in conjunction with explicit codelength minimization yields further
improvement. We describe a dynamic-programming algorithm for choosing a quadtree to minimize
the codelength. Current research is aimed at heuristics for speeding up the processing time and
use of similar ideas for gaining improvements in static image compression.
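The quadtree choice can be sketched as a bottom-up recursion over square blocks. This is an illustration under stated assumptions, not the algorithm of [4]: `leaf_cost(x, y, size)` is a hypothetical caller-supplied function returning the bits needed to code a block with a single motion vector, each internal decision costs one flag bit, and blocks of the minimum size need no flag. At every block the recursion keeps the cheaper of coding it as one leaf or splitting it into four optimally coded quadrants.

```python
def best_quadtree(x, y, size, min_size, leaf_cost):
    """Return (bits, tree) for the cheapest quadtree covering a square block."""
    if size <= min_size:
        # Smallest blocks cannot split, so no split flag is spent on them.
        return leaf_cost(x, y, size), ("leaf", x, y, size)

    as_leaf = 1 + leaf_cost(x, y, size)          # flag bit: "not split"

    half = size // 2
    parts = [best_quadtree(x + dx, y + dy, half, min_size, leaf_cost)
             for dy in (0, half) for dx in (0, half)]
    as_split = 1 + sum(bits for bits, _ in parts)  # flag bit: "split"

    if as_leaf <= as_split:
        return as_leaf, ("leaf", x, y, size)
    return as_split, ("split", [tree for _, tree in parts])
```

Because each quadrant's optimal cost is independent of its siblings, the minimum for a block follows directly from the minima of its four children, which is what makes an exact codelength minimization tractable.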

Machine  Learning 

In [1] we introduce a new technique which enables a learner without access to hidden information
to learn nearly as well as a learner with access to hidden information. We apply our technique to
solve an open problem of Maass and Turan. We describe analogous results for two generalizations
of this model to function learning, and apply those results to bound the difficulty of learning in
the harder of these models in terms of the difficulty of learning in the easier model. We bound the
difficulty of learning unions of k concepts from a class F in terms of the difficulty of learning F. We
bound the difficulty of learning in a noisy environment for deterministic algorithms in terms of the
difficulty of learning in a noise-free environment. We apply a variant of our technique to develop
an algorithm transformation that allows probabilistic learning algorithms to nearly optimally cope
with noise. A second variant enables us to improve a general lower bound of Turan for the
PAC-learning model (with queries). Finally, we show that logarithmically many membership queries
never help to obtain computationally efficient learning algorithms.

In [2], we consider the problem of learning real-valued functions from random examples when
the function values are corrupted with noise. With mild conditions on independent observation
noise, we provide characterizations of the learnability of a real-valued function class in terms of
a generalization of the Vapnik-Chervonenkis dimension, the fat-shattering function, introduced
by Kearns and Schapire. We show that, given some restrictions on the noise, a function class is
learnable in our model if and only if its fat-shattering function is finite. With different (also quite
mild) restrictions, satisfied for example by Gaussian noise, we show that a function class is learnable
from polynomially many examples if and only if its fat-shattering function grows polynomially. We
prove analogous results in an agnostic setting, where there is no assumption of an underlying
function class.
- “Models for Parallel Secondary and Hierarchical Storage,” Workshop on Models,
Architectures, and Technologies for Parallel Computation, DIMACS, Rutgers University, New
Brunswick, NJ.

- “Load Balancing Paradigms for Optimal Use of Parallel Disks and Parallel Memory
Hierarchies,” Stanford University, Stanford, CA.

- “Optimal Prediction via Data Compression,” University of Texas at Dallas, Dallas, TX.

- “Load Balancing Paradigms for Optimal Use of Parallel Disks and Parallel Memory
Hierarchies,” Keynote address at Workshop on Algorithmic Research in the Midsouthwest
(WARM 93), University of North Texas, Denton, TX.

- “Predictive Techniques for Caching and Locality Management,” Microsoft Corporation,
Redmond, WA.

- “Efficient Processing of Large-Scale Data,” Mathematisches Forschungsinstitut
Oberwolfach, Germany.

- “How to Predict Well,” Tulane University, New Orleans, LA.

- “How to Predict Well,” Supercomputing Research Center, Bowie, MD.

Dr. Paul G. Howard:

•  nominee  of  the  Department  of  Computer  Science  at  Brown  University  for  the  ACM  Doctoral 
Dissertation  Award,  based  on  work  supported  by  this  grant. 

Prof.  Philip  M.  Long: 

•  Invited  talks: 

- “Simulating Access to Hidden Information while Learning,” Australian National
University Systems Engineering Seminar.




Duke University

DURHAM 
DEPARTMENT OF COMPUTER SCIENCE (919) 660-6500

BOX  90129 


December  19, 1994 


Major David R. Luginbuhl
AFOSR/NM 

110 Duncan Avenue, Suite B115
Bolling  Air  Force  Base 
Washington,  DC  20332-0001 

RE:  AFOSR  Grant  No.  F49620-92-J-0515 

Dear  Major  Luginbuhl: 

Per  your  conversation  with  Dr.  Jeffrey  Vitter,  enclosed  is  the  final  project  report 
for  the  above  referenced  grant.  We  apologize  for  the  delay  of  this  report  and  hope  that  it 
has  not  caused  any  inconvenience. 

If  you  have  any  questions  or  concerns,  please  do  not  hesitate  to  contact  us. 


Sincerely, 


C^^^ina  Gaither 

Grants/Contracts Specialist


Enclosure 


cc: E. Charniak
L.  Rossi 


